Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makes sure KNNVectorValues aren't recreated unnecessarily when #2133

Merged
merged 1 commit into from
Sep 23, 2024

Conversation

shatejas
Copy link
Collaborator

@shatejas shatejas commented Sep 20, 2024

quantization isn't needed

Description

An attempt to address the regression seen in force-merge time for cohere-10m dataset.

KNNVectorValues creation is optimized to skip for non-quantization cases

The perf for force merge showed improvements in 1m dataset

  number of index segments force merge time (minutes) force merge segments
2.17 code 75 15.68263 3
Current PR 92 12.51252 3

Related Issues

#2134

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Copy link
Collaborator

@shatejas can you update the changelog

@navneet1v
Copy link
Collaborator

@shatejas is there a way we can validate how many times we are calling those MergedFloatValues in the unit testing?

@shatejas shatejas changed the title Makes sure KNNVectorValues aren't recreated unnecessarily when Optimizes NativeEngines990KNNVectorsWriter to make sure KNNVectorValues are not generated unnecessarily and totalLiveDocs calculation is optimal Sep 23, 2024
@shatejas shatejas force-pushed the force-merge-regression branch from 61d3f71 to 0a280c1 Compare September 23, 2024 19:16
@@ -34,7 +34,8 @@ public byte[] getVector() throws IOException {
@Override
public byte[] conditionalCloneVector() throws IOException {
byte[] vector = getVector();
if (vectorValuesIterator.getDocIdSetIterator() instanceof ByteVectorValues) {
if (vectorValuesIterator instanceof KNNVectorValuesIterator.MergeByteVectorValuesIterator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could add a wrapper method to unwrap so that we dont always have to check both

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its trivial currently so will prefer keeping it as it is, if it gets complicated we can either make a predicate or stub it in a method

return getVectorValues(vectorDataType, new KNNVectorValuesIterator.FieldWriterIteratorValues<>(docIdWithFieldSet, vectors));
}

public static <T> KNNVectorValues<T> getVectorValues(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: javadocs for all public?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will address javadocs with unit test


public final class KNNMergeVectorValues {

public static List<KNNVectorValuesSub<FloatVectorValues>> mergeFloatVectorValues(FieldInfo fieldInfo, MergeState mergeState)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc?


public static List<KNNVectorValuesSub<FloatVectorValues>> mergeFloatVectorValues(FieldInfo fieldInfo, MergeState mergeState)
throws IOException {
assert fieldInfo != null && fieldInfo.hasVectorValues();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure on this assert?

Copy link
Collaborator Author

@shatejas shatejas Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece was taken from lucene with minor edits so kept it as it is, any use case that need to be handled considering this is not used for BinaryDocValues?


private static int computeLiveDocs(final DocIdSetIterator values, Bits liveDocs) throws IOException {
int count = 0;
if (liveDocs != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cant we call length on live docs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length() counts 0s so cannot use that. cardinality() can only be obtained for FixedBitSet and that is handled before calling this

@shatejas shatejas force-pushed the force-merge-regression branch from 0a280c1 to cdcb1eb Compare September 23, 2024 20:28
@shatejas shatejas changed the title Optimizes NativeEngines990KNNVectorsWriter to make sure KNNVectorValues are not generated unnecessarily and totalLiveDocs calculation is optimal Makes sure KNNVectorValues aren't recreated unnecessarily when Sep 23, 2024
@shatejas
Copy link
Collaborator Author

Separated out liveDocs optimization to #2135

quantization isn't needed

Signed-off-by: Tejas Shah <shatejas@amazon.com>
@shatejas shatejas force-pushed the force-merge-regression branch from cdcb1eb to 2d88cce Compare September 23, 2024 20:53
@shatejas shatejas requested a review from jmazanec15 September 23, 2024 21:27
@jmazanec15 jmazanec15 merged commit e33afa5 into opensearch-project:main Sep 23, 2024
31 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 23, 2024
…zation isn't needed (#2133)

Signed-off-by: Tejas Shah <shatejas@amazon.com>
(cherry picked from commit e33afa5)
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.17 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.17 2.17
# Navigate to the new working tree
cd .worktrees/backport-2.17
# Create a new branch
git switch --create backport/backport-2133-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e33afa5de5f8658ad7fbe71125707436e81cc5b8
# Push it to GitHub
git push --set-upstream origin backport/backport-2133-to-2.17
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.17

Then, create a pull request where the base branch is 2.17 and the compare/head branch is backport/backport-2133-to-2.17.

shatejas added a commit to shatejas/k-NN that referenced this pull request Sep 24, 2024
…zation isn't needed (opensearch-project#2133)

Signed-off-by: Tejas Shah <shatejas@amazon.com>
(cherry picked from commit e33afa5)
shatejas added a commit to shatejas/k-NN that referenced this pull request Sep 24, 2024
…zation isn't needed (opensearch-project#2133)

Signed-off-by: Tejas Shah <shatejas@amazon.com>
(cherry picked from commit e33afa5)
shatejas added a commit to shatejas/k-NN that referenced this pull request Sep 24, 2024
…zation isn't needed (opensearch-project#2133)

Signed-off-by: Tejas Shah <shatejas@amazon.com>
(cherry picked from commit e33afa5)
naveentatikonda pushed a commit that referenced this pull request Sep 27, 2024
…zation isn't needed (#2133) (#2140)

Signed-off-by: Tejas Shah <shatejas@amazon.com>
(cherry picked from commit e33afa5)
naveentatikonda added a commit to naveentatikonda/k-NN that referenced this pull request Sep 27, 2024
naveentatikonda added a commit to naveentatikonda/k-NN that referenced this pull request Sep 27, 2024
…n quantization isn't needed (opensearch-project#2133) (opensearch-project#2140)"

This reverts commit ca6b03f.

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
naveentatikonda added a commit that referenced this pull request Sep 27, 2024
…n quantization isn't needed (#2133) (#2140)" (#2161)

This reverts commit ca6b03f.

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
navneet1v added a commit to navneet1v/k-NN that referenced this pull request Sep 28, 2024
…en quantization isn't needed (opensearch-project#2133) (opensearch-project#2140)" (opensearch-project#2161)

This reverts commit b0d82b7.

Signed-off-by: Navneet Verma <navneev@amazon.com>
@shatejas shatejas deleted the force-merge-regression branch November 27, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants